An Accelerated Gradient Method for Distributed Multi-Agent Planning with Factored MDPs
Abstract
We study optimization for collaborative multi-agent planning in factored Markov decision processes (MDPs) with shared resource constraints. Following past research, we derive a distributed planning algorithm for this setting based on Lagrangian relaxation: we optimize a convex dual function that maps a vector of resource prices to a bound on the achievable utility. Since the dual function is not differentiable, the most common method for optimizing it is subgradient descent. This method is appealing, since we can compute the subgradient by asking each agent to plan independently of the others using the current resource prices; however, subgradient descent unfortunately requires O(ε⁻²) iterations to achieve accuracy ε, and therefore the overall Lagrangian relaxation algorithm can have trouble scaling to realistic domains. So, instead, we propose to optimize a smoothed version of the dual function via a fast proximal gradient algorithm. By trading the error caused by smoothing against the faster convergence of the proximal gradient method, we demonstrate that we can obtain faster (O(ε⁻¹)) convergence of the overall Lagrangian relaxation. Furthermore, we propose a particular smoothing method, based on maximum causal entropy, for which the gradient calculation remains simple and efficient.
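As a rough, illustrative sketch of the pipeline the abstract describes (not the paper's actual algorithm), the toy below collapses each agent's planning problem to a one-shot choice among K actions with utilities u[i, k] and consumptions c[i, k] of a single shared resource of total capacity `capacity`; these names, the smoothing weight `mu`, and all problem sizes are assumptions made up for the example. The inner maximum in the Lagrangian dual is smoothed with an entropy term (for a one-shot choice, maximum causal entropy reduces to an ordinary softmax/log-sum-exp), and the smoothed dual is minimized over the resource price λ ≥ 0 with a FISTA-style accelerated projected-gradient loop standing in for the fast proximal gradient method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (all sizes/values made up): N agents, each picking one of K
# actions as a stand-in for solving its own MDP. u[i, k] is agent i's
# utility for action k; c[i, k] is its consumption of one shared resource.
N, K, capacity = 8, 5, 6.0
u = rng.uniform(0.0, 1.0, size=(N, K))
c = rng.uniform(0.0, 2.0, size=(N, K))

def smoothed_dual_and_grad(lam, mu):
    """Entropy-smoothed dual g_mu(lam) and its gradient.

    Each agent's inner max_k (u - lam*c) is replaced by the soft maximum
    mu * logsumexp((u - lam*c) / mu); the gradient is capacity minus the
    agents' expected consumption under their softmax ("soft plan").
    """
    adv = (u - lam * c) / mu                  # scaled advantages, shape (N, K)
    adv_max = adv.max(axis=1, keepdims=True)  # shift for a stable logsumexp
    z = np.exp(adv - adv_max)
    soft_vals = mu * (adv_max[:, 0] + np.log(z.sum(axis=1)))
    p = z / z.sum(axis=1, keepdims=True)      # each agent's soft plan
    g = soft_vals.sum() + lam * capacity
    grad = capacity - (p * c).sum()           # d g_mu / d lam
    return g, grad

def accelerated_price_descent(mu, iters=300):
    """Minimize g_mu over lam >= 0 with a FISTA-style accelerated step."""
    # The smoothed gradient is Lipschitz with constant ~ sum_i max_k c^2 / mu.
    L = (c.max(axis=1) ** 2).sum() / mu
    lam, y, t = 0.0, 0.0, 1.0
    for _ in range(iters):
        _, grad = smoothed_dual_and_grad(y, mu)
        lam_next = max(y - grad / L, 0.0)     # gradient step, project to lam >= 0
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = lam_next + ((t - 1.0) / t_next) * (lam_next - lam)  # momentum
        lam, t = lam_next, t_next
    return lam

lam_star = accelerated_price_descent(mu=0.05)
bound, _ = smoothed_dual_and_grad(lam_star, mu=0.05)
print(f"resource price = {lam_star:.3f}, smoothed dual bound = {bound:.3f}")
```

Note how the gradient of the smoothed dual decomposes across agents: each term is just that agent's soft best response to the current price, which is what keeps the method distributed. Shrinking mu reduces the smoothing error but inflates the Lipschitz constant L, which is the trade-off the abstract refers to.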
Similar resources
Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes
We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple-decision-maker environment, under the infinite-horizon average-reward criterion, with a general joint reward structure but a factorial joint state-transition structure. We introduce the "localization" concept: a global MDP is localized for each agent such that each agent needs to consider a local MDP defined only with...
Multiagent Planning with Factored MDPs
We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we as...
Memory-Efficient Symbolic Online Planning for Factored MDPs
Factored Markov Decision Processes (MDPs) are a de facto standard for compactly modeling sequential decision-making problems with uncertainty. Offline planning based on symbolic operators exploits the factored structure of MDPs, but is memory intensive. We present new memory-efficient symbolic operators for online planning, prove the soundness of the operators, and show convergence of the corresp...
Distributed Market-Based Algorithms for Multi-Agent Planning with Shared Resources
We propose a new family of market-based distributed planning algorithms for collaborative multi-agent systems with complex shared constraints. Such constraints tightly couple the agents together, and appear in problems ranging from task or resource allocation to collision avoidance. While it is not immediately obvious, a wide variety of constraints can in fact be reduced to generalized resource...
Online Symbolic Gradient-Based Optimization for Factored Action MDPs
This paper investigates online stochastic planning for problems with large factored state and action spaces. We introduce a novel algorithm that builds a symbolic representation capturing an approximation of the action-value Q-function in terms of action variables, and then performs gradient-based search to select an action for the current state. The algorithm can be seen as a symbolic extensio...
Publication date: 2011